Research Statement — Dani Yogatama
Abstract
I design algorithms for intelligent processing of natural language text, for example to extract factual information into a structured database (e.g., extracting the headquarters locations, CEOs, and phone numbers of companies from text) or to predict real-world events from text (e.g., scientific trends and disease outbreaks). These applications require models of text that scale to large datasets. I advance machine learning (ML) methods for natural language processing (NLP), focusing on large-scale sparse models that leverage expert-informed domain knowledge. In my research, I seek to answer the following questions:
Similar resources
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.
A Sparse and Adaptive Prior for Time-Dependent Model Parameters
We consider the scenario where the parameters of a probabilistic model are expected to vary over time. We construct a novel prior distribution that promotes sparsity and adapts the strength of correlation between parameters at successive timesteps, based on the data. We derive approximate variational inference procedures for learning and prediction with this prior. We test the approach on two t...
Embedding Methods for Fine Grained Entity Type Classification
We propose a new approach to the task of fine-grained entity type classification based on label embeddings that allows for information sharing among related labels. Specifically, we learn an embedding for each label and each feature such that labels which frequently co-occur are close in the embedded space. We show that it outperforms state-of-the-art methods on two fine-grained entity-classif...
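A minimal sketch of how a label-embedding classifier can score candidate types, assuming (for illustration only) that a mention is represented by the sum of its feature embeddings and that each label's score is the dot product with that label's embedding. The feature names, labels, and vectors are hypothetical, and the margin ranking loss is a generic stand-in: the abstract does not specify the paper's training objective. The property that co-occurring labels end up close together would come from training, which is not shown.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def embed_mention(features: Sequence[str], feature_emb: Dict[str, Vector], dim: int) -> Vector:
    """Represent a mention as the sum of its feature embeddings (unknown features are skipped)."""
    v = [0.0] * dim
    for f in features:
        e = feature_emb.get(f)
        if e is None:
            continue
        for i in range(dim):
            v[i] += e[i]
    return v


def score_labels(features: Sequence[str], feature_emb: Dict[str, Vector],
                 label_emb: Dict[str, Vector], dim: int) -> Dict[str, float]:
    """Score every label by the dot product of its embedding with the mention embedding."""
    x = embed_mention(features, feature_emb, dim)
    return {label: dot(x, e) for label, e in label_emb.items()}


def margin_rank_loss(scores: Dict[str, float], positive: str, negative: str,
                     margin: float = 1.0) -> float:
    """Hinge loss asking a correct label to outscore an incorrect one by at least `margin`."""
    return max(0.0, margin - scores[positive] + scores[negative])


# Hypothetical toy example: two features, three labels, 2-dimensional embeddings.
feature_emb = {"head=president": [1.0, 0.0], "word=Obama": [0.5, 0.5]}
label_emb = {
    "/person": [1.0, 0.2],
    "/person/politician": [0.9, 0.4],
    "/location": [-1.0, 0.0],
}
scores = score_labels(["head=president", "word=Obama"], feature_emb, label_emb, dim=2)
```

With these toy vectors both person types outscore `/location`, and the ranking loss for the pair (`/person/politician`, `/location`) is already zero; a trained model would learn the vectors so that related types such as `/person` and `/person/politician` receive similar scores.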
Linguistic Structured Sparsity in Text Categorization
We introduce three linguistically motivated structured regularizers based on parse trees, topics, and hierarchical word clusters for text categorization. These regularizers impose linguistic bias in feature weights, enabling us to incorporate prior knowledge into conventional bag-of-words models. We show that our structured regularizers consistently improve classification accuracies compared to ...
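One common way to build regularizers of this kind is the group lasso, whose penalty sums lam * sqrt(|g|) * ||w_g||_2 over groups g of feature weights (for instance, the words in one hierarchical cluster). The sketch below is an illustration under assumptions, not the paper's implementation: it handles only disjoint groups via block soft-thresholding, whereas overlapping groups such as nested parse-tree constituents need additional machinery (e.g., ADMM). The groups, `lam`, and `step` values are hypothetical.

```python
import math
from typing import List, Sequence


def group_lasso_penalty(w: Sequence[float], groups: List[List[int]], lam: float) -> float:
    """Sum over groups g of lam * sqrt(|g|) * ||w_g||_2."""
    total = 0.0
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        total += lam * math.sqrt(len(g)) * norm
    return total


def group_prox(w: Sequence[float], groups: List[List[int]], lam: float, step: float) -> List[float]:
    """Proximal operator of step * group_lasso_penalty, valid for DISJOINT groups only.

    Each group's weights are shrunk toward zero as a block; a group whose norm falls
    below the threshold is zeroed out entirely. Features in no group are left untouched.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        thresh = step * lam * math.sqrt(len(g))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out


# Hypothetical example: features 0-1 form one word cluster, features 2-3 another.
w = [3.0, 4.0, 0.1, 0.1]
groups = [[0, 1], [2, 3]]
shrunk = group_prox(w, groups, lam=1.0, step=1.0)
```

In a proximal-gradient loop one would take a gradient step on the classification loss and then apply `group_prox`; here the weak second cluster is driven to exactly zero while the strong first cluster is only shrunk, which is the group-level sparsity pattern such regularizers aim for.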
Predicting a Scientific Community's Response to an Article
We consider the problem of predicting measurable responses to scientific articles based primarily on their text content. Specifically, we consider papers in two fields (economics and computational linguistics) and make predictions about downloads and within-community citations. Our approach is based on generalized linear models, allowing interpretability; a novel extension that captures first-o...
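To illustrate why a generalized linear model is interpretable, here is a minimal sketch of one GLM that could serve for count responses such as downloads: Poisson regression with a log link, fit by batch gradient ascent on the log-likelihood. The Poisson family, learning rate, epoch count, and toy data are all assumptions; the abstract does not say which GLMs or features the paper uses. Each coefficient has a direct reading: increasing feature j by one unit multiplies the predicted rate by exp(w_j).

```python
import math
from typing import List, Sequence, Tuple


def linear_predictor(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """eta = b + w . x"""
    return b + sum(wi * xi for wi, xi in zip(w, x))


def poisson_loglik(w: Sequence[float], b: float,
                   X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Poisson log-likelihood, dropping the constant -sum(log(y_i!))."""
    ll = 0.0
    for x, yi in zip(X, y):
        eta = linear_predictor(w, b, x)
        ll += yi * eta - math.exp(eta)
    return ll


def fit_poisson(X: Sequence[Sequence[float]], y: Sequence[float], lr: float = 0.01,
                epochs: int = 2000, l2: float = 0.0) -> Tuple[List[float], float]:
    """Batch gradient ascent on the (optionally L2-penalized) Poisson log-likelihood."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * d
        gb = 0.0
        for x, yi in zip(X, y):
            resid = yi - math.exp(linear_predictor(w, b, x))  # observed minus predicted rate
            gb += resid
            for j in range(d):
                gw[j] += resid * x[j]
        b += lr * gb
        for j in range(d):
            w[j] += lr * (gw[j] - l2 * w[j])  # the intercept b is left unpenalized
    return w, b


# Hypothetical toy data: one binary feature; articles with it get 8 downloads, others 2.
X = [[0.0], [1.0], [0.0], [1.0]]
y = [2, 8, 2, 8]
w, b = fit_poisson(X, y)
```

On this toy data the fit recovers exp(b) close to 2 and exp(w[0]) close to 4, i.e., the feature is associated with a fourfold increase in expected downloads: the kind of direct coefficient-level reading that makes GLMs interpretable.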
Publication date: 2015